MMCacheSim: A Highly Configurable Matrix Multiplication Cache Simulator
Abstract
Memory access is a major bottleneck in computation. CPU caches were introduced to speed up access to reused and local data. Matrix multiplication is the most common representative of the many linear algebra algorithms whose performance depends directly on the cache. Many cache parameters affect overall computing performance, such as cache type, line size, cache size, level, associativity, and replacement policy. An optimal architecture for executing a given compute- and memory-intensive algorithm is therefore desirable in most applications. We have developed the MMCacheSim simulator to predict matrix multiplication performance on a particular existing or non-existing multiprocessor. MMCacheSim simulates the execution time and the number of cache misses that a matrix multiplication algorithm incurs for a particular matrix size and element size when executing on processors with different cache sizes, line sizes, levels, associativity, and replacement policies.
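To make the parameters named in the abstract concrete, the following is a minimal sketch of how cache misses for a matrix multiplication could be counted. It is not MMCacheSim's implementation: the class `SetAssociativeCache`, the function `simulate_matmul`, the single cache level, the LRU policy, the row-major layout, and the naive i-j-k loop order are all assumptions chosen for illustration, and execution time is not modeled.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Hypothetical single-level, set-associative cache with LRU replacement."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        # Number of sets follows from total size, line size, and associativity.
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        line = address // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(cache_set) >= self.ways:
                cache_set.popitem(last=False)  # evict least recently used line
            cache_set[line] = True


def simulate_matmul(n, elem_bytes, cache):
    """Replay the memory accesses of a naive i-j-k product C = A * B.

    Assumes square n x n matrices stored contiguously in row-major order,
    with A, B, and C placed back to back starting at address 0.
    """
    base_a = 0
    base_b = n * n * elem_bytes
    base_c = 2 * n * n * elem_bytes
    for i in range(n):
        for j in range(n):
            for k in range(n):
                cache.access(base_a + (i * n + k) * elem_bytes)  # read A[i][k]
                cache.access(base_b + (k * n + j) * elem_bytes)  # read B[k][j]
            cache.access(base_c + (i * n + j) * elem_bytes)      # write C[i][j]
    return cache.misses


if __name__ == "__main__":
    # Assumed example configuration: 32 KiB, 64-byte lines, 8-way.
    cache = SetAssociativeCache(size_bytes=32 * 1024, line_bytes=64, ways=8)
    misses = simulate_matmul(n=64, elem_bytes=8, cache=cache)
    print("misses:", misses, "hits:", cache.hits)
```

Varying `size_bytes`, `line_bytes`, `ways`, `n`, and `elem_bytes` in this sketch shows how the miss count responds to each parameter. A different replacement policy or additional cache levels would require extending the class.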
Similar resources
Accelerating Blocked Matrix-Matrix Multiplication using a Software-Managed Memory Hierarchy with DMA
The optimization of matrix-matrix multiplication (MMM) performance has been well studied on general-purpose desktop and server processors. Classic solutions exploit common microarchitectural features including superscalar execution and the cache and TLB hierarchy to achieve near-peak performance. Typical digital signal processors (DSPs) do not have these features, and instead use in-order execu...
Moola: Multicore Cache Simulator
Chip multiprocessors have become the normative architecture for medium and high performance processors. These devices introduce new questions and research topics. One such topic is exploring the design space of a cache-memory hierarchy that prevents the memory accesses from being a limiting factor on system performance. Simulation of system workloads is a widely accepted method for evaluating pr...
Hardware-software co-simulation of bus-based reconfigurable systems
One of the most flexible and modular approaches to reconfigurable systems is a bus-based approach. In order to get realistic performance estimates of these systems, detailed modeling of the processor as well as the bus and memory hierarchy is required. In addition, when coupling one or more reconfigurable units with a superscalar, out-of-order issue, load/store RISC CPU using the on-chip system...
Adaptive Matrix Multiplication in Heterogeneous Environments
In this paper, an adaptive matrix multiplication algorithm for dynamic heterogeneous environments is developed and evaluated. Unlike the state-of-the-art approaches, where load balancing is achieved through unequal distribution of the matrix data among the heterogeneous nodes, the matrices in our approach are partitioned into blocks of equal size. Task allocation and the block size are adapted ...
Optimizing Matrix-matrix Multiplication for an Embedded VLIW Processor
The optimization of matrix-matrix multiplication (MMM) performance has been well studied on conventional general-purpose processors like the Intel Pentium 4. Fast algorithms, such as those in the Goto and ATLAS BLAS libraries, exploit common microarchitectural features including superscalar execution and the cache and TLB hierarchy to achieve near-peak performance. However, the microarchitectur...
Publication date: 2012